Abstract


This  paper  studies  the  problem  of  obtaining  depth  information  from  focusing  and 
defocusing,  which  have  long  been  noticed  as  important  sources  of  depth  information 
for  human  and  machine  vision.  In  depth  from  focusing,  we  try  to  eliminate  the  local 
maxima  problem  which  is  the  main  source  of  inaccuracy  in  focusing;  in  depth  from 
defocusing,  a  new  computational  model  is  proposed  to  achieve  higher  accuracy. 

The major contributions of this paper are: (1) in depth from focusing, instead of
the popular Fibonacci search, which is often trapped in local maxima, we propose the
combination of Fibonacci search and curve fitting, which leads to an unprecedentedly
accurate result; (2) a new model of the blurring effect, which takes the geometric
blurring as well as the imaging blurring into consideration, together with the
calibration of the blurring model; (3) in spectrogram-based depth from defocusing, an
iterative estimation method to decrease or eliminate the window effect.

This paper reports focus ranging with less than 1/1000 error and defocus
ranging with about 1/200 error. With this precision, depth from focus ranging is
becoming  competitive  with  stereo  vision  for  reconstructing  3D  depth  information. 


1  Introduction 

Obtaining  depth  information  by  actively  controlling  camera  parameters  is  becoming 
more  and  more  important  in  machine  vision,  because  it  is  passive  and  monocular. 
Compared with the popular stereo method for depth recovery, the focus method
does not have the correspondence problem, which makes it a valuable
alternative to the stereo method for depth recovery.

There  are  two  distinct  scenarios  for  using  focus  information  for  depth  recovery: 

• Depth From Focus: We try to determine the distance to one point by taking many
images in better and better focus. This is also called “autofocus” or “software focus”.
The best reported result is 1/200 depth error at about 1 meter distance [14].

• Depth From Defocus: By taking a small number of images under different lens
parameters, we can determine depth at all points in the scene. This is a possible
range image sensor, competing with laser range scanners or stereo vision. The best
reported result is 1.3% RMS error in terms of distance from the camera when
the target is about 0.9 m away [6].

Both methods have been limited in the past by low-precision hardware and imprecise
mathematical models. In this paper, we improve both:

• Depth From Focus: We propose a stronger search algorithm with its implementation
on a high-precision camera motor system.

•  Depth  From  Defocus:  We  propose  a  new  estimation  method  and  a  more  realistic 
calibration  model  for  the  blurring  effect. 

With these new results, focus is becoming viable as a technique for machine vision
applications such as terrain mapping and object recognition.


2  Depth  From  Focusing 

2.1  Introduction 

Focusing has long been considered one of the major depth sources for human and
machine vision. In this section, we concentrate on the precision problem of
focusing. We approach high precision from both the software and the hardware
directions, namely, stronger algorithms and a more precise camera system.

Most previous research on depth from focusing concentrated on the development and
evaluation of different focus measures, such as [17, 7, 9, 15]. Among them, [15]
provided a theory for evaluating various focus measures based on the OTF (Optical
Transfer Function) instead of experimental evaluations, which could be biased by the
specific scenes used for experiments. As described by all these researchers, an ideal
focus measure should be unimodal, monotonic, and should reach the maximum only when
the image is focused. But as pointed out in [18, 15], the focus measure profile has many
local maxima due to noise and/or the side-lobe effect ([15]) even after magnification
compensation ([18]). This essentially requires a more complicated peak detection
method than the Fibonacci search, which is optimal under the unimodal
assumption as in [1, 7]. In this paper, we use a recognized focus measure from the
literature, which is the Tenengrad with zero threshold in [7] or the $|\nabla f|^2$ method in [15].
Our major concern is to discover to what extent the precision of focus ranging can
scale up with more precise camera systems and more sophisticated search algorithms.
We knew the popular Fibonacci search method would not be enough when we learned
that being trapped in local maxima is the major cause of focusing error. Instead, we
propose the combination of Fibonacci search and curve fitting to detect the peak of
the focus measure profile precisely and quickly.

To  evaluate  the  results  from  peak  detections,  an  error  analysis  method  is  presented 
to  analyze  the  uncertainty  of  the  peak  detection  in  the  motor  count  space,  and  to 
convert  the  uncertainty  in  the  motor  count  space  into  uncertainty  in  depth.  We 
compute the variance of motor positions resulting from peak detections over equal
depth  targets.  The  Rayleigh  criterion  of  resolution  is  applied  to  the  distribution  of 
motor  positions  to  calculate  the  minimal  differentiable  motor  displacement.  With 
the  assumption  of  local  linearity  of  the  mapping  from  the  motor  count  space  to  focus 
depth,  the  minimal  differentiable  motor  displacement  is  converted  to  the  minimal 
differentiable  depth. 

The lack of high-precision equipment has been a limiting factor in previous
implementations of various focus ranging methods. Many implemented systems, such as
SPARCS, have fairly low motor resolution, which actually prohibits more precise
results. We will give a brief description of the motor-driven camera system in the
Calibrated Imaging Lab later; further details can be found in [19].

2.2  Fibonacci  Search  and  Curve  Fitting 

When  the  focus  motor  resolution  is  high,  we  usually  have  a  very  large  parameter 
space  which  prevents  us  from  exhaustively  searching  all  motor  positions.  Based  on 
the  unimodal  assumption  of  focus  measure  profile,  Fibonacci  search  was  employed  to 
narrow  the  parameter  space  down  to  the  peak  [1]. 

Assume the initial interval is $[x, y]$, and we know the focus measure profile is
unimodal in this interval. If $x < x_1 < x_2 < y$ and $F(x_1) < F(x_2)$, where $F$ is the
focus measure function, then the peak cannot be within the interval $[x, x_1)$; otherwise
the unimodal assumption would be violated. Therefore, if we choose $x_1$ and
$x_2$ properly, the peak can be found optimally. Fibonacci search is the optimal search
under the unimodal assumption.
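
The interval-elimination rule above can be sketched as follows. This is a minimal illustration, not the implementation used in the experiments: the function name `fibonacci_search`, the integer motor-count interface, and the padding of the interval with virtual positions valued at minus infinity are all assumptions made for the example.

```python
def fibonacci_search(focus_measure, lo, hi):
    """Return the integer position in [lo, hi] that maximizes a unimodal function.

    The open interval (a, a + fib[k]) always contains the peak. Positions
    beyond `hi` are virtual and treated as minus infinity (an assumption of
    this sketch), so the interval length can always be a Fibonacci number.
    """
    fib = [1, 1]
    while fib[-1] < hi - lo + 2:
        fib.append(fib[-1] + fib[-2])
    k = len(fib) - 1
    a = lo - 1          # exclusive left end of the current interval
    cache = {}          # each motor position is evaluated at most once

    def value(m):
        if m > hi:
            return float("-inf")
        if m not in cache:
            cache[m] = focus_measure(m)
        return cache[m]

    while fib[k] > 2:
        x1 = a + fib[k - 2]
        x2 = a + fib[k - 1]
        if value(x1) < value(x2):
            a = x1      # the peak cannot lie in (a, x1]
        # otherwise the peak cannot lie in [x2, end); the left end stays put
        k -= 1
    return a + 1


# Hypothetical unimodal profile over 5100 focus motor positions.
peak = fibonacci_search(lambda m: -(m - 1234) ** 2, 0, 5099)
```

Because one of the two probes is reused at every step, only about one new evaluation is needed per iteration, which is why the method is attractive when each evaluation requires moving the motor and grabbing an image.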

Figure  1  shows  the  target  used  for  testing  the  focus  measure,  and  Figure  2  is  the 
focus  measure  profile  of  the  target. 

It  is  clear  from  Figure  2  that  Fibonacci  search  will  fail  to  detect  the  peak  precisely 
because  of  the  jagged  profile.  Fortunately,  those  local  maxima  are  small  in  size,  and 
therefore  can  be  regarded  as  disturbances.  From  previous  paragraphs,  we  know  that 
the Fibonacci search only evaluates at two points within the interval, which gives
rise to the hope that when the interval is large, Fibonacci search is still applicable
because it will overlook those small ripples.

Figure 1: Step Edge Image as Target

As the search goes on, the interval becomes smaller and smaller. Consequently,
Fibonacci search must be aborted at some point, before the search becomes misleading.
We can set a threshold experimentally: when the length of the interval is less
than the threshold, Fibonacci search is replaced by an exhaustive search. After the
exhaustive search, a curve is fitted to the part of the profile resulting from the
exhaustive search.

In our experiments, we set the threshold to be 5 motor counts. So when the
Fibonacci search narrows down the whole motor space to $[a, b]$, where $b - a < 5$, an
exhaustive search is fired on the interval $[a - c, b + c]$, where $c$ is a positive constant.
A Gaussian function is fitted to the profile in the interval $[a - c, b + c]$ using the least
squares method described in [11].
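
The refinement step can be sketched as below. This is an illustration under stated assumptions rather than the fitting routine of [11]: it fits a parabola to the logarithm of the focus measure by linear least squares, which is equivalent to a Gaussian fit only when the samples are positive and the noise is small. The names `gaussian_peak` and `refine_peak` are hypothetical.

```python
import math


def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def gaussian_peak(positions, values):
    """Least-squares fit of ln(value) = a*t^2 + b*t + c with t = position - mean.

    Returns the sub-count peak position mean - b / (2a). All values must be
    positive so that the logarithm exists.
    """
    n = len(positions)
    center = sum(positions) / n
    t = [p - center for p in positions]
    y = [math.log(v) for v in values]
    s = [sum(ti ** k for ti in t) for k in range(5)]            # s[k] = sum of t^k
    r = [sum((ti ** k) * yi for ti, yi in zip(t, y)) for k in range(3)]
    normal = [[s[4], s[3], s[2]], [s[3], s[2], s[1]], [s[2], s[1], s[0]]]
    rhs = [r[2], r[1], r[0]]
    d = det3(normal)

    def cramer(col):
        m = [row[:] for row in normal]
        for i in range(3):
            m[i][col] = rhs[i]
        return det3(m) / d

    a, b = cramer(0), cramer(1)
    if a >= 0:
        raise ValueError("profile is not peaked inside the fitting interval")
    return center - b / (2.0 * a)


def refine_peak(focus_measure, a, b, c):
    """Exhaustive search on [a - c, b + c] followed by the Gaussian fit."""
    positions = list(range(a - c, b + c + 1))
    return gaussian_peak(positions, [focus_measure(m) for m in positions])
```

For example, with a synthetic Gaussian profile centred at 111.3 motor counts, `refine_peak(profile, 109, 113, 8)` recovers the centre even though it lies between two motor positions, which is the point of fitting a curve rather than picking the largest sample.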

Figure  3  shows  the  result  when  Fibonacci  search  alone  is  applied  to  the  focus 
measure  profile.  Apparently,  the  search  is  trapped  in  a  local  maximum.  Figure  4 
shows  the  result  from  Gaussian  function  fitting.  Both  graphs  show  only  a  part  of  the 
whole  motor  space. 

2.3  Error  Analysis 

The  depth  error  from  focus  ranging  generally  results  from  two  sources:  error  in  peak 
detection  and  error  in  mapping  from  camera  parameters  to  range.  Here,  we  are 
primarily  concerned  with  the  error  in  peak  detection,  because  in  our  experiments,  the 
repeatability and accuracy of the camera motor system are so high that the error of
the  mapping  calibration  can  be  regarded  as  negligible  compared  with  the  error  from 
the  peak  detection. 

Figure 2: Focus Measure Profile

The major sources of error in peak detection are:

• Image Noise: Since the focus criterion is essentially a gradient operator, high
frequency noise may cause a problem. In our experiments, this is the major
limiting factor.

• Image Shift: Focusing involves changes of camera parameters, which, in turn,
may cause image misregistrations. There are two major kinds of image
misregistration: one is the magnification change due to the different effective focal
length; the other is the image shift caused by imperfections of the optical system.
For example, if the axes of the lenses are not exactly collinear, changing camera
parameters may not only cause a magnification change of the image, but also
shift the whole image by several pixels [18].

• Search Strategy: As explained earlier, a naive search might be trapped in local
maxima.

• Content of Image: Since the focus measure is essentially a high-pass filter, it
requires that the content within the evaluation window have enough high frequency
components. In the extreme case, a uniform intensity pattern will provide no
information about depth at all. Quantitatively, more high frequency components
mean a sharper peak in Figure 2, and therefore, more precise peak detection.

As explained earlier, we are concerned about the error from searching, so other
sources of error should be identified and minimized. Image magnification changes can
be compensated by camera calibration as suggested in [4, 18]. To tolerate the image
shift caused by the optical system, we used a fairly large window of 40x40. Because we
use a target as in Figure 1, the image magnification change and the image shift can be
ignored if the edge is in the middle of the window. Therefore, in our experiments,
we did not use the magnification compensation. The content of the image certainly
includes enough high frequency components.

Because of the depth accuracy we expected, a direct measurement of absolute
depth is impossible. Instead, we prefer to use the minimal differentiable depth as an
indication of depth accuracy. If we assume the peak motor positions resulting from
repeating the same experiment have a Gaussian distribution, we can define the minimal
differentiable motor displacement as the minimal difference between two motor counts
which have a pre-defined probability of representing different peaks. For example, in
Figure 5, the Gaussian distribution is artificially cut at the 90% line, so we can say
that if we do one focus ranging experiment on a target at the depth corresponding to
the peak $a$, there is a 90% probability that the motor count will be within the interval $A$.

Different pre-defined probabilities can be used in the definition of the minimal
differentiable motor displacement. Table 1 lists probabilities and the corresponding
motor displacements in terms of the Gaussian standard deviation $\sigma$. We define the
minimal differentiable motor displacement based on the Rayleigh criterion for resolution
[2], which specifies the saddle-to-peak ratio as $8/\pi^2$. In the case of the Gaussian
distribution, the cut-off line corresponding to the Rayleigh criterion is about $0.9\sigma$.
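
The cut-off values of Table 1 appear to be two-sided quantiles of a normal distribution, that is, the half-width $z$ for which a sample falls within $z$ standard deviations of the mean with the stated probability. The sketch below reproduces them under that assumption; the helper name `cutoff_in_sigmas` is hypothetical.

```python
from statistics import NormalDist


def cutoff_in_sigmas(probability):
    """Half-width z (in standard deviations) with P(|X - mean| < z*sigma) = probability."""
    return NormalDist().inv_cdf((1.0 + probability) / 2.0)


for p in (0.5, 0.6, 0.7, 0.8, 0.9):
    print(f"{p:.0%}: {cutoff_in_sigmas(p):.3f} sigma")
```

The computed values agree with the entries of Table 1 to within about 0.01 standard deviations.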

Table 1: Different Definitions of Minimal Differentiable Motor Displacements

Cut-off Line of Gaussian    Probability of correctness
0.68σ                       50%
0.84σ                       60%
1.03σ                       70%
1.28σ                       80%
1.64σ                       90%

There is a definite mapping from a motor count to an absolute depth value.
Assume $d = f(m)$, where $d$ is the depth, $m$ the motor count and $f$ the mapping; we have

$$\frac{\Delta d}{\Delta m} = f'(m), \tag{1}$$

where $f'(m)$ is the first order derivative with respect to $m$. Because what we really
want to know is the minimal differentiable depth, or depth resolution, $\Delta d$, and we
already have the minimal differentiable motor displacement $\Delta m$, the only thing that
needs to be calibrated is $f(m)$. If we assume $f'(m)$ is a constant in the vicinity of
$d = D$, and the motor count distribution has its center at $m = M$, then when the target
is moved by $\Delta D$, the distribution center moves by $\Delta M$, and we will have the
minimal differentiable depth

$$\Delta d = \frac{\Delta m}{\Delta M}\,\Delta D, \tag{2}$$

where $\Delta m$ is the minimal differentiable motor displacement.


2.4  Implementation  and  Result 

2.4.1  Hardware 

We implemented this focus ranging algorithm in the Calibrated Imaging Laboratory,
using the Fujinon/Photometrics camera system [19]. This system consists of a Fujinon
ENG zoom lens mounted on a Photometrics Star I scientific camera. The camera can
provide a 12-bit per pixel greyscale image. The color band can be selected by the filter
wheel mounted in front of the lens.
Figure 5: Minimal Differentiable Motor Displacement

The focal length can change from 10 mm to 130 mm with 11100 motor steps,
the focus distance can change from approximately 1 meter to infinity with 5100
motor steps, and the aperture can change from F1.7 to completely closed with 2700 motor
steps. The SNR of the camera can be as high as 400/1 because of the pixel-by-pixel
digitization and the -40°C temperature of the sensor.

2.4.2  Experiments  and  Results 

We  put  the  target  of  Figure  1  at  about  1.2  meters  away  from  the  front  lens  element  of 
the  camera.  Maximal  focal  length  and  maximal  aperture  are  employed  to  achieve  the 
minimal  depth  of  field.  The  evaluation  window  is  40x40,  while  the  gradient  operator 
is  a  3x3  Sobel  operator. 

The distribution of motor positions resulting from an experiment repeated 40 times
is sketched in Figure 6. With the mean as the center of a Gaussian, and the
standard deviation as the $\sigma$ of the Gaussian, we have the minimal differentiable motor
displacement as $2 \times 0.9\sigma = 4.5$ motor counts.

Then the target was moved 1 centimeter toward the camera, and we repeated the
above experiments. The center of the motor count distribution moved 38 counts.
Therefore, by Eq. 2, we have the minimal differentiable depth:

$$\Delta d = \frac{\Delta m}{\Delta M}\,\Delta D = \frac{4.5}{38} \times 1\,\mathrm{cm} = 0.118\,\mathrm{cm}. \tag{3}$$

And the relative depth error is about 0.118 / 120 = 0.098%.
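
The arithmetic of this result can be checked with a few lines. The numbers are the ones reported above; the standard deviation of 2.5 motor counts is not stated in the text but is implied by $2 \times 0.9\sigma = 4.5$, and the function name is hypothetical.

```python
def minimal_differentiable_depth(delta_m, delta_big_m, delta_big_d):
    """Eq. 2: convert a motor-count resolution into a depth resolution."""
    return delta_m / delta_big_m * delta_big_d


sigma_counts = 2.5                      # implied by 2 * 0.9 * sigma = 4.5 counts
delta_m = 2 * 0.9 * sigma_counts        # minimal differentiable motor displacement
delta_d_cm = minimal_differentiable_depth(delta_m, 38.0, 1.0)
relative_error = delta_d_cm / 120.0     # target at about 1.2 m = 120 cm

print(f"delta_d = {delta_d_cm:.3f} cm, relative error = {relative_error:.3%}")
```

The local-linearity assumption enters only through the ratio of the 1 cm target displacement to the 38-count shift of the distribution center.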
Figure 6: Motor Position Distribution (vertical axis: probability; horizontal axis: motor position)

3  Depth  From  Defocusing 

3.1  Introduction 

The depth from defocusing method uses the direct relationships among the depth,
camera parameters and the amount of blurring in images to derive the depth from
parameters that can be directly measured. In this part of the paper, we propose an
iterative method to estimate the amount of defocusing accurately, and a
calibration-based blurring model.

Because  the  blurring  in  an  image  can  be  caused  by  either  the  imaging  process  or 
the  scene  itself,  it  generally  requires  at  least  two  images  taken  under  different  camera 
configurations  to  eliminate  this  ambiguity.  Pentland  solved  this  problem  by  taking  one 
picture  by  a  pin-hole  camera,  which  can  be  regarded  as  the  orthographic  projection 
of  the  scene  with  zero  imaging  blurring,  and  another  one  by  a  wide  aperture  camera 
[10]. In this paper, we intend to employ two images which are defocused to different
extents.

Window effects have largely been ignored in the literature of this field, except in
[6, 5], where the author derived the RMS depth error as a function of the size
of the window. For example, when the window is 4 pixels by 4 pixels, the RMS error
from the window effect can be as large as 65.8%! The iterative method we propose
is capable of eliminating the window effect. Note also that the size of the
window is the decisive factor that limits the resolution of depth maps if we try to
obtain a dense depth map. Therefore, if we can use a smaller window without reducing
the quality of the results, the resolution of dense depth maps can be much higher.

Previous work has employed oversimplified camera models to derive the relationship
between blurring functions and camera configurations. In [10, 13, 3], the radii
of blurring circles are derived from the ideal thin lens model. In this paper, we will
propose a more sophisticated function which directly relates the blurring function
to the camera motors. Experimental results are very consistent with this model, as will
be shown later.

3.2  Basic  Theory 

The depth from defocus method is based on the idea that the amount of blurring
change is directly related to the depth and camera parameters. Since the camera
parameters can be calibrated, the depth can be expressed by the amount of blurring
change correspondingly. To estimate the amount of blurring change, we need a model
of the optical blurring. Traditionally, the blurring effect is modeled as convolution
with a Gaussian in the computer vision literature, partly due to its mathematical
tractability. Here we will still assume a Gaussian model, i.e.:


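$$I(x, y) = \int_{-\infty}^{+\infty}\int_{-\infty}^{+\infty} I_0(\xi, \eta)\, g_{\sigma(d,c)}(x - \xi, y - \eta)\, d\xi\, d\eta \tag{4}$$

$$g_{\sigma}(x, y) = \frac{1}{2\pi\sigma^2}\, e^{-\frac{x^2 + y^2}{2\sigma^2}} \tag{5}$$

where $I(x, y)$ is the intensity image, $d$ is the depth of the object, $c$ is the vector of
camera parameters, and $g_{\sigma(d,c)}$ is the blurring function.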

For  simplicity,  we  can  instead  consider  the  one  dimensional  case,  in  which  Eq.  4 
and  Eq.  5  become 

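$$I(x) = \int_{-\infty}^{+\infty} I_0(\xi)\, g_{\sigma(d,c)}(x - \xi)\, d\xi \tag{6}$$

$$g_{\sigma}(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{x^2}{2\sigma^2}} \tag{7}$$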

The basic idea of the depth-from-defocus method is that, in Eq. 6, $I(x)$, which is
the image, and $c$, which results from camera calibration, are known, while $d$ and
$I_0(x)$ are unknown. If we take two images under different camera settings $c_1$ and
$c_2$, then at least theoretically $d$ can be computed.

But since Eq. 6 is not a linear equation with respect to the unknown $d$, directly
solving for $d$ is either impossible or numerically unstable. Pentland proposed a method
to solve for $d$ by Fourier transforms [10]:

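If

$$I_1(x) = I_0(x) * g_{\sigma_1}(x) \tag{8}$$

$$I_2(x) = I_0(x) * g_{\sigma_2}(x) \tag{9}$$

then, writing $\mathcal{F}$ for the Fourier transform and script letters for transformed
quantities,

$$\ln\frac{\mathcal{F}[I_1(x)]}{\mathcal{F}[I_2(x)]} = \ln\frac{\mathcal{I}_1(f)}{\mathcal{I}_2(f)} = \ln\frac{\mathcal{I}_0(f)\,\mathcal{G}_{\sigma_1}(f)}{\mathcal{I}_0(f)\,\mathcal{G}_{\sigma_2}(f)} = \ln\frac{\mathcal{G}_{\sigma_1}(f)}{\mathcal{G}_{\sigma_2}(f)} = -\frac{1}{2} f^2 (\sigma_1^2 - \sigma_2^2) \tag{10}$$

where

$$\sigma_1 = \sigma(d, c_1) \tag{11}$$

$$\sigma_2 = \sigma(d, c_2) \tag{12}$$

Substituting Eq. 11 and Eq. 12 into Eq. 10, we get:

$$\ln\frac{\mathcal{I}_1(f)}{\mathcal{I}_2(f)} = -\frac{1}{2} f^2 \left(\sigma^2(d, c_1) - \sigma^2(d, c_2)\right) \tag{13}$$

where the function $\sigma$ can be calibrated.

Obviously, in Eq. 13, the only unknown is $d$; therefore, depth recovery from two
images is straightforward.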
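
The cancellation in Eq. 10 can be illustrated numerically. The sketch below is a toy one-dimensional check, not the procedure used in the experiments: it blurs a random hypothetical scene by circular convolution with sampled Gaussians, evaluates the discrete Fourier transform at a few low-frequency bins, and fits the slope of the log magnitude ratio through the origin. All function names are assumptions, and $f$ is taken as angular frequency so that the transform of the Gaussian is $e^{-\sigma^2 f^2/2}$.

```python
import cmath
import math
import random


def circular_blur(signal, sigma):
    """Circular convolution with a sampled Gaussian kernel normalized to unit sum."""
    n = len(signal)
    radius = int(6 * sigma) + 1
    kernel = [math.exp(-k * k / (2.0 * sigma * sigma)) for k in range(-radius, radius + 1)]
    norm = sum(kernel)
    return [
        sum(kernel[j + radius] * signal[(i - j) % n] for j in range(-radius, radius + 1)) / norm
        for i in range(n)
    ]


def dft_at(signal, omega):
    """Discrete Fourier transform of the signal at angular frequency omega."""
    return sum(v * cmath.exp(-1j * omega * i) for i, v in enumerate(signal))


def variance_difference(image1, image2, bins):
    """Estimate sigma1^2 - sigma2^2 from ln|I1/I2| = -(omega^2 / 2) * difference."""
    n = len(image1)
    xs, ys = [], []
    for k in bins:
        omega = 2.0 * math.pi * k / n
        ratio = abs(dft_at(image1, omega)) / abs(dft_at(image2, omega))
        xs.append(-omega * omega / 2.0)
        ys.append(math.log(ratio))
    return sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)


rng = random.Random(0)
scene = [rng.random() for _ in range(64)]      # unknown scene I_0
estimate = variance_difference(circular_blur(scene, 3.0),
                               circular_blur(scene, 2.0), range(1, 9))
```

With blur widths of 3 and 2 pixels the estimate is close to $3^2 - 2^2 = 5$ regardless of the scene, because the scene spectrum cancels in the ratio. This holds here only because the transform covers the whole signal.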

3.3  Gabor  Transform  and  Window  Effect 

The method explained above is based on $\mathcal{F}[I(x)]$, which is the Fourier transform
of the entire image. Thus, only one $d$ can be calculated from the entire image. If
our goal is to obtain a dense depth map $d(x, y)$, we are forced to use the STFT
(Short Time Fourier Transform) to preserve the depth locality. To eliminate the
spurious high frequency components generated by the discontinuity at the window
boundary, people usually multiply the window by a window function. Unfortunately,
the elegant cancellation in Eq. 10 does not hold any more if we introduce the window
function $W(x)$:

$$\frac{\mathcal{F}[I_1(x)W(x)]}{\mathcal{F}[I_2(x)W(x)]} = \frac{\mathcal{I}_1(f) * \mathcal{W}(f)}{\mathcal{I}_2(f) * \mathcal{W}(f)} = \frac{\left(\mathcal{I}_0(f)\,\mathcal{G}_{\sigma_1}(f)\right) * \mathcal{W}(f)}{\left(\mathcal{I}_0(f)\,\mathcal{G}_{\sigma_2}(f)\right) * \mathcal{W}(f)} \tag{14}$$

The convolutions in Eq. 14 introduce blurring in both the time (space) and frequency
domains. The Gabor transform [12], which uses a Gaussian as the window
function, can minimize the product of spatial uncertainty and spectral uncertainty.
Assuming a Gaussian function is used as $W(x) = g_{\sigma}(x)$, and its Fourier transform is
$\mathcal{W}(f) = e^{-\sigma^2 f^2/2}$, we have:

$$(\Delta f)^2 = \frac{\int f^2\, |\mathcal{W}(f)|^2\, df}{\int |\mathcal{W}(f)|^2\, df} = \frac{1}{2\sigma^2} \tag{15}$$

$$(\Delta x)^2 = \frac{\int x^2\, |W(x)|^2\, dx}{\int |W(x)|^2\, dx} = \frac{\sigma^2}{2} \tag{16}$$

The above equations mean that, in the frequency domain, two frequency components
$\Delta f$ apart cannot be discriminated, and in the space domain, two impulses $\Delta x$
apart cannot be discriminated either. Apparently, one interpretation of why Eq. 10
does not hold is that $\Delta f$ is not zero. As $\sigma$ approaches infinity, $\Delta f$ approaches zero.
In other words, as the window function approaches a constant function, the Gabor
transform becomes more and more accurate in the frequency domain but, on the
other hand, less and less accurate in the space domain, as indicated in Eq. 16.
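
The two second-moment ratios can be evaluated numerically as a sanity check. The sketch below uses plain Riemann sums; the function names, integration limits and step counts are assumptions chosen for illustration.

```python
import math


def rms_width(func, lo, hi, steps):
    """Square root of the second moment of |func|^2 (the step size cancels in the ratio)."""
    h = (hi - lo) / steps
    num = den = 0.0
    for i in range(steps + 1):
        t = lo + i * h
        p = func(t) ** 2
        num += t * t * p
        den += p
    return math.sqrt(num / den)


def window_uncertainties(sigma):
    """Return (delta_x, delta_f) for a Gaussian window of width sigma."""
    w_space = lambda x: math.exp(-x * x / (2.0 * sigma ** 2))
    w_freq = lambda f: math.exp(-(sigma ** 2) * f * f / 2.0)
    dx = rms_width(w_space, -12.0 * sigma, 12.0 * sigma, 20000)
    df = rms_width(w_freq, -12.0 / sigma, 12.0 / sigma, 20000)
    return dx, df


dx, df = window_uncertainties(3.0)
```

For a window with $\sigma = 3$ pixels this gives a spatial uncertainty of about 2.12 pixels and a product $\Delta x\,\Delta f$ of one half, independent of $\sigma$, so narrowing the window in space necessarily widens it in frequency.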
3.4 Maximal Resemblance Estimation


From observation, we know that when $\sigma_1$ approaches $\sigma_2$, the logarithm of the ratio in
Eq. 14 also approaches zero; in other words, when $\sigma_1$ almost equals $\sigma_2$, Eq. 10 can
be a good approximation in terms of absolute error. This observation suggests an
iterative method in which the blurring difference $\Delta$ is refined by blurring one image
to resemble the other in the vicinity of one pixel. In symbols (assuming $\Delta^{(k)}$ is the
$k$th estimate of $\sigma_1^2 - \sigma_2^2$):

1. $I_1^{(0)} = I_1$, $I_2^{(0)} = I_2$ and $\Delta = 0.0$, $k = 0$;

2. $\mathcal{I}_1^{(k)} = \mathcal{F}[I_1^{(k)} W]$,
   $\mathcal{I}_2^{(k)} = \mathcal{F}[I_2^{(k)} W]$.

3. Fit a curve to $\ln\frac{\mathcal{I}_1^{(k)}}{\mathcal{I}_2^{(k)}} = -f^2 \Delta^{(k)}/2$. (Refer to Eq. 10)

4. $\Delta = \sum_{i=0}^{k} \Delta^{(i)}$.

5. If $\Delta > 0$, then
   $I_1^{(k+1)} = I_1$,
   $I_2^{(k+1)} = I_2 * G_{\sqrt{\Delta}}$;
   else,
   $I_1^{(k+1)} = I_1 * G_{\sqrt{-\Delta}}$,
   $I_2^{(k+1)} = I_2$.
   Note all these convolutions are done very locally because of the window function
   multiplication in step 2.

6. If the termination criteria are satisfied, exit.

7. $k = k + 1$, go to step 2.

All  above  operations  involve  only  local  pixels,  and  don’t  require  taking  new  pic¬ 
tures.  Therefore,  the  computation  can  be  done  in  parallel  to  all  pixels  to  obtain  a 
dense  depth  map. 
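
A toy version of the iteration, under strong simplifying assumptions, is sketched below. The target is an ideal line (an impulse), so each blurred image is a sampled Gaussian and re-blurring in step 5 amounts to adding variance instead of performing an explicit convolution. The fit in step 3 is an ordinary least-squares line with an intercept, which absorbs the gain difference of the two windowed images. The sample grid, the frequency list and all function names are hypothetical.

```python
import cmath
import math


def gaussian(x, var):
    """Unit-area Gaussian with variance `var` evaluated at x."""
    return math.exp(-x * x / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def windowed_spectrum(signal, xs, window_var, omegas):
    """Magnitudes of the Gaussian-windowed Fourier transform (step 2)."""
    mags = []
    for w in omegas:
        s = sum(v * gaussian(x, window_var) * cmath.exp(-1j * w * x)
                for v, x in zip(signal, xs))
        mags.append(abs(s))
    return mags


def fit_delta(mag1, mag2, omegas):
    """Step 3: slope of ln(|I1|/|I2|) against omega^2 equals -Delta/2."""
    u = [w * w for w in omegas]
    y = [math.log(a / b) for a, b in zip(mag1, mag2)]
    mu, my = sum(u) / len(u), sum(y) / len(y)
    slope = (sum((a - mu) * (b - my) for a, b in zip(u, y))
             / sum((a - mu) ** 2 for a in u))
    return -2.0 * slope


def maximal_resemblance(var1, var2, window_var, iterations):
    """Return the running estimate of var1 - var2 after each iteration."""
    xs = list(range(-40, 41))
    omegas = [0.1 * i for i in range(1, 11)]
    total, history = 0.0, []
    for _ in range(iterations):
        # Step 5: blur the sharper image; for a line target this adds variance.
        extra1 = -total if total < 0 else 0.0
        extra2 = total if total > 0 else 0.0
        i1 = [gaussian(x, var1 + extra1) for x in xs]
        i2 = [gaussian(x, var2 + extra2) for x in xs]
        m1 = windowed_spectrum(i1, xs, window_var, omegas)
        m2 = windowed_spectrum(i2, xs, window_var, omegas)
        total += fit_delta(m1, m2, omegas)   # step 4: accumulate the increments
        history.append(total)
    return history


history = maximal_resemblance(9.0, 4.0, 9.0, 60)
```

In this toy setting, with $\sigma_1 = 3$, $\sigma_2 = 2$ and a window of $\sigma = 3$ pixels, the first estimate is about 1.73 instead of the true value 5, which shows how severe the window effect is for a narrow window. The running sum then approaches 5 as the two windowed images come to resemble each other.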

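Let us trace the above iterations. In the first cycle, we have (assuming $\sigma_1 > \sigma_2$):

$$\Delta^{(0)} = (\sigma_1^2 - \sigma_2^2) + E_{(0)},$$

where $E_{(0)}$ is the error of estimating $\sigma_1^2 - \sigma_2^2$, and

$$I_1^{(1)} = I_1 = I_0 * G_{\sigma_1}, \qquad I_2^{(1)} = I_2 * G_{\sqrt{\Delta^{(0)}}} = I_0 * G_{\sqrt{\sigma_1^2 + E_{(0)}}}.$$

We can see that, after the first iteration, we have actually switched to estimating
$E_{(0)}$. So after $k$ iterations, we have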
$$\Delta = \sum_{i=0}^{k} \Delta^{(i)} = \sigma_1^2 - \sigma_2^2 + E_{(k)} \tag{17}$$

Now the problem is whether the sequence $E_{(k)}$ ($k = 0, 1, 2, \ldots$) converges to zero.
Unfortunately, there is no way to prove this mathematically because it depends on the
fitting method used in step 3. Notice that if, in step 3, we get an estimate of
$\sigma_1^2 - \sigma_2^2$ based on only one particular frequency component, $E_{(k)}$ may diverge.
Previous depth from defocus methods usually counted on a pre-selected frequency band,
such as in [10]; sometimes this may cause a very large error if there is not enough
energy of the image content within the frequency band.

3.5  Fitting  Algorithm 

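As is common to any frequency analysis, we need a robust algorithm to extract
$\sigma_1^2 - \sigma_2^2$ in Eq. 10 in a noisy environment. Ignoring the phase information
resulting from the Gabor transform, Eq. 10 becomes:

$$\ln\frac{|\mathcal{I}_1(f)|^2}{|\mathcal{I}_2(f)|^2} = -f^2 (\sigma_1^2 - \sigma_2^2) \tag{18}$$

Assuming an additive white noise model, we have:

$$\ln\frac{|\mathcal{I}_1(f)|^2 + n_1}{|\mathcal{I}_2(f)|^2 + n_2} = \ln\left(|\mathcal{I}_1(f)|^2 + n_1\right) - \ln\left(|\mathcal{I}_2(f)|^2 + n_2\right), \tag{19}$$

where $n_1$ and $n_2$ are the energies of the noise. Because $\ln(x + dx) \approx \ln x + \frac{1}{x}dx$,
if we assume $|\mathcal{I}_1(f)|^2 \gg n_1$ and $|\mathcal{I}_2(f)|^2 \gg n_2$, Eq. 19 can be approximated as:

$$\ln\left(|\mathcal{I}_1(f)|^2 + n_1\right) - \ln\left(|\mathcal{I}_2(f)|^2 + n_2\right) \approx \ln\frac{|\mathcal{I}_1(f)|^2}{|\mathcal{I}_2(f)|^2} + \left(\frac{n_1}{|\mathcal{I}_1(f)|^2} - \frac{n_2}{|\mathcal{I}_2(f)|^2}\right) \tag{20}$$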

Therefore, at each frequency, the left-hand side of Eq. 18 can be approximated by
dividing the corresponding spectral energies of the two images at that frequency,
provided that the energy values are much larger than the energy of the noise. The
deviation of this approximation can be expressed as:

$$\epsilon_f = \kappa \left(\frac{1}{|\mathcal{I}_1(f)|^2} + \frac{1}{|\mathcal{I}_2(f)|^2}\right) \tag{21}$$

where $\kappa$ is a constant related to the noise energy of the camera.

Certainly, Eq. 21 is an approximation that models the error distribution as a
Gaussian. Intuitively, when $|\mathcal{I}_1(f)|^2$ or $|\mathcal{I}_2(f)|^2$ is large, i.e. the energy
within the frequency is high, the deviation is small, and vice versa.

From Eq. 18, we know this estimation problem is a typical linear regression problem.
With the uncertainty measurement approximated in Eq. 21, $\sigma_1^2 - \sigma_2^2$ can be
estimated robustly. More details can be found in [11].
There is one more problem that needs to be addressed. Since we obtain images under
different camera configurations, the total energy within each image is different.
Usually, a brightness normalization is performed on every image before the Gabor
transforms, as in [16]. But since this normalization will have different effects on the
noise in the two images, which will complicate the uncertainty analysis, we prefer not
to normalize the brightness of the two images. Instead we assume:

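$$I_1(x) = c_1 I_0(x) * g_{\sigma_1}(x) \tag{22}$$

$$I_2(x) = c_2 I_0(x) * g_{\sigma_2}(x), \tag{23}$$

where $c_1$ and $c_2$ are two unknown constants. Substituting the two equations into
Eq. 18, we have:

$$\ln\frac{|\mathcal{I}_1(f)|^2}{|\mathcal{I}_2(f)|^2} = -f^2 (\sigma_1^2 - \sigma_2^2) + 2\ln\frac{c_1}{c_2}, \tag{24}$$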

which  is  still  a  linear  problem,  while  the  uncertainty  analysis  still  holds. 
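
The regression of this section can be sketched as a weighted least-squares line fit. This is an illustration, not the routine of [11]: the weights are taken as the inverse square of a deviation proportional to the sum of the reciprocal spectral energies, which is one reading of Eq. 21, and the function names are hypothetical.

```python
import math


def weighted_line_fit(x, y, weights):
    """Weighted least squares for y = slope * x + intercept."""
    sw = sum(weights)
    mx = sum(w * xi for w, xi in zip(weights, x)) / sw
    my = sum(w * yi for w, yi in zip(weights, y)) / sw
    sxx = sum(w * (xi - mx) ** 2 for w, xi in zip(weights, x))
    sxy = sum(w * (xi - mx) * (yi - my) for w, xi, yi in zip(weights, x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def estimate_blur_difference(energy1, energy2, freqs):
    """Return (sigma1^2 - sigma2^2, ln(c1/c2)) from spectral energies, per Eq. 24."""
    x = [-f * f for f in freqs]
    y = [math.log(e1 / e2) for e1, e2 in zip(energy1, energy2)]
    # Assumed form of Eq. 21: deviation proportional to 1/e1 + 1/e2.
    # The proportionality constant cancels in the weighted fit.
    weights = [1.0 / (1.0 / e1 + 1.0 / e2) ** 2 for e1, e2 in zip(energy1, energy2)]
    slope, intercept = weighted_line_fit(x, y, weights)
    return slope, intercept / 2.0
```

Frequencies with little energy in either image receive small weights, so they influence the estimate less. This is the practical difference from methods that rely on a single pre-selected frequency band.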


3.6  Blurring  Model 

Since  the  defocus  ranging  method  derives  the  depth  instead  of  searching  for  the  depth, 
it  requires  a  direct  modeling  of  defocusing  in  terms  of  camera  parameters  and  depth. 
Previous  researchers  usually  derived  the  relation  among  lens  parameters,  the  depth 
and  the  blurring  radius,  such  as  in  [10,  13].  For  example,  in  [10],  by  simple  geometric 
optics,  Pentland  derived  the  formula: 


$$D = \frac{F v_0}{v_0 - F - \sigma k f} \tag{25}$$

where $D$ is the depth, $F$ the focal length, $f$ the f-number of the lens, $v_0$ the distance
between lens and image plane, $\sigma$ the blurring circle radius, and $k$ a constant.

The basic limitation of this approach is that those parameters are based on the
ideal thin lens model and, in fact, they can never be measured precisely on any camera.
We desire a function in terms of motor counts, which are measurable and
controllable. For instance, if we use $m_z$ for the zoom motor count, $m_f$ for the focus
motor count, and $m_a$ for the aperture motor count, we wish to get a function in the form
of $D = F(m_z, m_f, m_a, \sigma)$ or $\sigma = F(m_z, m_f, m_a, D)$. Since there is a depth ambiguity
in the former form ([10], Appendix), we prefer to express the blurring radius $\sigma$ as a
function of motor counts and the depth.

From Eq. 25, we can express $\sigma$ as:

$$\sigma = \frac{v_0 - F}{k f} - \frac{F v_0 / (k f)}{D} \tag{26}$$


Since all the lens parameters can be thought of as derived from motor counts, we can
rewrite Eq. 26 as:

$$\sigma = k_1(m_z, m_f, m_a) + \frac{k_2(m_z, m_f, m_a)}{D + k_3(m_z, m_f, m_a)} \tag{27}$$

Notice that there is another term $k_3$ added. Because $D$ is the distance between the lens
and the object, and the position of the lens changes as camera parameters are changed, we
intend to use a fixed plane perpendicular to the optical axis as the depth reference
plane at $z = k_3$. From now on, we always refer to depth as the distance between the
target and the depth reference plane.

Eq. 27 says that, at some point, $\sigma$ will drop to zero, which is the best focused point.
But we know that we can never get a real step edge, because there is always high
frequency loss in the imaging process, which can be attributed to pixel quantization,
diffraction, etc. Similarly, we can model this loss as a convolution with a Gaussian
independent of the geometric blurring which we already modeled. We use $k_4$ to model
its width.

Since two consecutive convolutions with Gaussians are equivalent to one Gaussian
convolution:

$$G_{\sigma_1} * G_{\sigma_2} = G_{\sqrt{\sigma_1^2 + \sigma_2^2}} \qquad (28)$$

we have our final blurring model expression as:

$$\sigma = \sqrt{\left[k_1(m_z, m_f, m_a) + \frac{k_2(m_z, m_f, m_a)}{D + k_3(m_z, m_f, m_a)}\right]^2 + k_4^2(m_z, m_f, m_a)} \qquad (29)$$
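
As a small illustration of the blurring model, the sketch below evaluates the total blur for one fixed motor setting. The coefficient values are hypothetical and chosen only for illustration (they are not calibrated values from this report), and `blur_radius` is a name introduced here. It shows two properties stated in the text: the blur never drops below $k_4$, and two different depths can produce the same $\sigma$, which is the depth ambiguity mentioned earlier.

```python
import math

def blur_radius(depth, k1, k2, k3, k4):
    """Total blur sigma at a given depth for one fixed motor setting."""
    geometric = k1 + k2 / (depth + k3)   # geometric term; its sign flips past best focus
    return math.sqrt(geometric ** 2 + k4 ** 2)

# Hypothetical coefficients for a single motor setting (assumed, not calibrated).
K1, K2, K3, K4 = -3.0, 5.0, 0.5, 0.8

# Best focus: the geometric term vanishes, so sigma bottoms out at k4, not at zero.
best_focus_depth = -K2 / K1 - K3
sigma_min = blur_radius(best_focus_depth, K1, K2, K3, K4)

# Depth ambiguity: geometric terms of +1 and -1 give the same total sigma.
depth_near = K2 / (1.0 - K1) - K3    # solves k1 + k2 / (D + k3) = +1
depth_far = K2 / (-1.0 - K1) - K3    # solves k1 + k2 / (D + k3) = -1
sigma_near = blur_radius(depth_near, K1, K2, K3, K4)
sigma_far = blur_radius(depth_far, K1, K2, K3, K4)

print(f"best focus at D = {best_focus_depth:.4f}, sigma = {sigma_min:.4f}")
print(f"D = {depth_near:.4f} and D = {depth_far:.4f} give sigma = {sigma_near:.4f}, {sigma_far:.4f}")
```

A single blur measurement therefore cannot distinguish the two depths, which is one reason to prefer the form $\sigma = F(m_z, m_f, m_a, D)$.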


3.7  Implementation  and  Results 

3.7.1  Simulation 

Our first simulation examines how precise the estimate of $\sigma_1^2 - \sigma_2^2$ can be. We use
a step function as $f_0$ and convolve it with two different Gaussians, $G_{\sigma_1}$ and $G_{\sigma_2}$. The
window function is also a Gaussian, with $\sigma$ equal to three pixel widths (see footnote 1). From Eq. 16,
the locality of the window function is about 2 pixel widths.

The result of the iterative method is illustrated in Fig. 7, which shows how poor the
first estimate can be when the window function is narrow. Experimentally,
the final error is bounded below by the discretization of the functions: when
the $\sigma$ of the Gaussian function is too small with respect to the pixel width,
the discrete Gaussian is no longer a good approximation of the real Gaussian function,
and the results begin to degenerate.
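
The discretization limit can be checked numerically. The following is a minimal sketch, not the simulation code used for this report: it samples a unit-area Gaussian at integer pixel positions and measures how far the sum of the samples departs from 1. The helper names and the list of $\sigma$ values are assumptions made for illustration.

```python
import math

def sampled_gaussian(sigma, half_width=50):
    """Sample a unit-area continuous Gaussian at integer pixel positions."""
    norm = 1.0 / (math.sqrt(2.0 * math.pi) * sigma)
    return [norm * math.exp(-(x * x) / (2.0 * sigma * sigma))
            for x in range(-half_width, half_width + 1)]

def discrete_area(sigma):
    """Sum of the samples; it stays close to 1 only while sampling is adequate."""
    return sum(sampled_gaussian(sigma))

# sigma is given in pixel widths; the smaller values are deliberately too narrow.
for sigma in (3.0, 1.0, 0.5, 0.3):
    error = abs(discrete_area(sigma) - 1.0)
    print(f"sigma = {sigma:3.1f} pixels, area error = {error:.2e}")
```

For a wide Gaussian the sampled kernel keeps unit area almost exactly, while for a sub-pixel $\sigma$ the area error becomes large, consistent with the degradation described above.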

Figure 8 shows the absolute error of the estimate of $\sigma_1^2 - \sigma_2^2$ in the absence of noise.
Generally, the errors are less than 1/1000 of the true values. Note that in those
figures the blurring radius $\sigma$ due to defocus can be much larger than the window
size.

1 In this report, all $\sigma$ values are in pixel widths.

Figure 9: Edge Bleeding (vertical axis: normalized intensity)


3.7.2  Variation  Measure  and  Thresholding 

Certainly, if there is little or no texture within the image, we cannot expect
to obtain an accurate estimate of depth. Therefore, we need a measure that can
discriminate image patches with enough texture from those without.
Another reason for a variation measure is the so-called edge
bleeding [8]. For example, in Figure 9 a step edge is blurred by
different amounts. Image No. 1 clearly contains fewer high-frequency components
than image No. 2 does, but if the window is at A, the image variation at A in image
No. 1 is larger than that in image No. 2, and therefore wrong results will be deduced.

For a given image patch, the variation measure is expressed as
in Eq. 30. Applying it to a real image gives the variation map in Figure 10,
which quantifies the variation content within the neighborhood of each pixel.

(30) 

To better illustrate the relation between the variation measure and the
estimate of $\sigma_1^2 - \sigma_2^2$, Figure 11 shows the selection of the threshold that excludes the
effects of low variation content and edge bleeding.
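
Edge bleeding can be reproduced with a few lines of code. The sketch below is only an illustration: it uses the intensity variance inside a window as a stand-in for the variation measure, since the exact form of Eq. 30 is not reproduced here. The blur widths, window positions, threshold value, and the rule of requiring both images to pass the threshold are all assumptions.

```python
import math

def blurred_step(x, sigma):
    """Unit step edge at x = 0 blurred by a Gaussian of width sigma."""
    return 0.5 * (1.0 + math.erf(x / (math.sqrt(2.0) * sigma)))

def local_variance(sigma, center, half_width):
    """Intensity variance inside a window (assumed stand-in for the variation measure)."""
    values = [blurred_step(x, sigma)
              for x in range(center - half_width, center + half_width + 1)]
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

SIGMA_IMAGE_1 = 5.0   # more blurred image (hypothetical, in pixels)
SIGMA_IMAGE_2 = 1.0   # sharper image (hypothetical, in pixels)
HALF_WIDTH = 2        # window covers 5 pixels

# Window centered on the edge: the sharper image shows more variation.
edge_1 = local_variance(SIGMA_IMAGE_1, 0, HALF_WIDTH)
edge_2 = local_variance(SIGMA_IMAGE_2, 0, HALF_WIDTH)

# Window at A, off the edge: the more blurred image "bleeds" into the window,
# so its variation is larger even though it has less high-frequency content.
at_a_1 = local_variance(SIGMA_IMAGE_1, 6, HALF_WIDTH)
at_a_2 = local_variance(SIGMA_IMAGE_2, 6, HALF_WIDTH)

THRESHOLD = 5e-3      # hypothetical threshold on the variation measure

def usable(v1, v2):
    """Assumed rule: keep a window only if both images exceed the threshold."""
    return min(v1, v2) > THRESHOLD

print("on edge:", edge_1, edge_2, usable(edge_1, edge_2))
print("at A   :", at_a_1, at_a_2, usable(at_a_1, at_a_2))
```

With these assumed values the window on the edge is kept and the window at A is rejected, which is the behavior the threshold is meant to provide.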

3.7.3  Calibration  of  Blurring  Function 

First, we tried to confirm our assumption that the blurring function can be
approximated by a Gaussian function. Ideally, if we had a point light source, the image
of this light source would be the blurring function itself, because

$$\delta(x) * F(x) = F(x). \qquad (31)$$

Figure 10: Intensity Image and Variation Map


Figure 11: Variation Measure and Threshold (horizontal axis: pixel position; vertical axis: normalized intensity; curves: Image No. 1, Image No. 2, variation measure, estimate of $\sigma_1^2 - \sigma_2^2$)


Because an ideal point light source is technically difficult to realize, a step edge image is used
instead, as shown in Figure 1. Assuming the step function is $u(x)$, the image of the step
edge should be

$$g_\sigma(x) * u(x) = \int_0^{+\infty} \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{(x-t)^2}{2\sigma^2}}\, dt = c_1 + c_2\, \mathrm{Erf}\!\left(\frac{x}{\sqrt{2}\,\sigma}\right) \qquad (32)$$

where Erf is the error function, and $c_1$ and $c_2$ are two constants.

Figure 12 illustrates the least-squares fitting results for a blurred step edge.


Figure  12:  Blurring  Function  Fitting 
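
The edge fitting can be sketched as follows. This is one possible least-squares procedure, not necessarily the one used for Figure 12: for each trial $\sigma$ the constants $c_1$ and $c_2$ of the error-function model are solved in closed form, and $\sigma$ itself is found by a golden-section search. The sketch assumes the edge position is known (at $x = 0$) and uses noise-free synthetic data; all function names and numeric values are illustrative.

```python
import math

def erf_basis(xs, sigma):
    """Error-function profile of a step edge at x = 0 for a trial sigma."""
    return [math.erf(x / (math.sqrt(2.0) * sigma)) for x in xs]

def linear_fit(basis, ys):
    """Least-squares c1, c2 for ys ~ c1 + c2 * basis, plus the squared error."""
    n = len(ys)
    mean_b = sum(basis) / n
    mean_y = sum(ys) / n
    sbb = sum((b - mean_b) ** 2 for b in basis)
    sby = sum((b - mean_b) * (y - mean_y) for b, y in zip(basis, ys))
    c2 = sby / sbb
    c1 = mean_y - c2 * mean_b
    sse = sum((c1 + c2 * b - y) ** 2 for b, y in zip(basis, ys))
    return c1, c2, sse

def fit_blurred_edge(xs, ys, sigma_lo=0.5, sigma_hi=10.0, tol=1e-8):
    """Golden-section search over sigma; c1 and c2 are solved linearly each time."""
    ratio = (math.sqrt(5.0) - 1.0) / 2.0

    def cost(sigma):
        return linear_fit(erf_basis(xs, sigma), ys)[2]

    lo, hi = sigma_lo, sigma_hi
    while hi - lo > tol:
        p = hi - ratio * (hi - lo)
        q = lo + ratio * (hi - lo)
        if cost(p) < cost(q):
            hi = q
        else:
            lo = p
    sigma = 0.5 * (lo + hi)
    c1, c2, _ = linear_fit(erf_basis(xs, sigma), ys)
    return sigma, c1, c2

# Synthetic blurred edge with assumed parameters (sigma = 2.5 pixels).
pixels = list(range(-20, 21))
profile = [10.0 + 40.0 * math.erf(x / (math.sqrt(2.0) * 2.5)) for x in pixels]
sigma_hat, c1_hat, c2_hat = fit_blurred_edge(pixels, profile)
print(sigma_hat, c1_hat, c2_hat)
```

With real images the edge position would also have to be estimated, for example as an additional search parameter.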


3.7.4  Blurring  Model  Calibration 

The coefficients $k_1$, $k_2$, $k_3$, $k_4$ in Eq. 29 are constants when the motors are fixed. We can
move the calibration target to four different places and assume that, at the first place,
the depth of the target is zero (note that the depth is measured with respect to the reference plane). This
gives four nonlinear equations in four unknowns. To suppress noise, we can
measure at more than four places and fit the blurring model to the results.
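
One way to carry out such a fit is sketched below. It is an illustrative procedure, not necessarily the one used in this report. It relies on the observation that, for a fixed $k_3$, the squared model $\sigma^2 = (k_1 + k_2 u)^2 + k_4^2$ with $u = 1/(D + k_3)$ is a quadratic in $u$, so its coefficients follow from linear least squares; $k_3$ is then found by a one-dimensional search. The search range for $k_3$, the sign convention $k_2 \ge 0$, and the synthetic noise-free measurements are assumptions.

```python
import math

def blur_radius(depth, k1, k2, k3, k4):
    """Blurring model of Eq. 29 for one fixed motor setting."""
    return math.sqrt((k1 + k2 / (depth + k3)) ** 2 + k4 ** 2)

def solve3(matrix, rhs):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    m = [row[:] + [r] for row, r in zip(matrix, rhs)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, 3):
            factor = m[row][col] / m[col][col]
            for j in range(col, 4):
                m[row][j] -= factor * m[col][j]
    x = [0.0, 0.0, 0.0]
    for row in (2, 1, 0):
        tail = sum(m[row][j] * x[j] for j in range(row + 1, 3))
        x[row] = (m[row][3] - tail) / m[row][row]
    return x

def quadratic_fit(depths, sigmas, k3):
    """With k3 fixed, fit sigma^2 = a + b*u + c*u^2 where u = 1 / (D + k3)."""
    us = [1.0 / (d + k3) for d in depths]
    ys = [s * s for s in sigmas]
    power_sums = [sum(u ** p for u in us) for p in range(5)]
    moments = [sum(y * u ** p for u, y in zip(us, ys)) for p in range(3)]
    normal = [[power_sums[i + j] for j in range(3)] for i in range(3)]
    a, b, c = solve3(normal, moments)
    sse = sum((a + b * u + c * u * u - y) ** 2 for u, y in zip(us, ys))
    return sse, a, b, c

def calibrate(depths, sigmas, k3_lo=0.1, k3_hi=2.0, steps=190):
    """Scan k3 on a grid, refine by golden-section search, then unpack k1, k2, k4."""
    step = (k3_hi - k3_lo) / steps
    grid = [k3_lo + i * step for i in range(steps + 1)]
    best = min(grid, key=lambda k3: quadratic_fit(depths, sigmas, k3)[0])
    lo, hi = max(best - step, k3_lo), min(best + step, k3_hi)
    ratio = (math.sqrt(5.0) - 1.0) / 2.0
    while hi - lo > 1e-9:
        p, q = hi - ratio * (hi - lo), lo + ratio * (hi - lo)
        if quadratic_fit(depths, sigmas, p)[0] < quadratic_fit(depths, sigmas, q)[0]:
            hi = q
        else:
            lo = p
    k3 = 0.5 * (lo + hi)
    _, a, b, c = quadratic_fit(depths, sigmas, k3)
    k2 = math.sqrt(max(c, 0.0))            # c = k2^2; assumed convention k2 >= 0
    k1 = b / (2.0 * k2)                    # b = 2 * k1 * k2
    k4 = math.sqrt(max(a - k1 * k1, 0.0))  # a = k1^2 + k4^2
    return k1, k2, k3, k4

# Synthetic measurements at nine target positions; the first depth is zero.
TRUE_K = (-3.0, 5.0, 0.5, 0.8)             # hypothetical coefficients
target_depths = [0.25 * i for i in range(9)]
measured_sigmas = [blur_radius(d, *TRUE_K) for d in target_depths]
fitted_k = calibrate(target_depths, measured_sigmas)
print(fitted_k)
```

Because the model depends on $k_1$ and $k_2$ only through $(k_1 + k_2 u)^2$, the pair is determined up to a common sign; the sketch resolves this by taking $k_2 \ge 0$.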

Using the rail table in CIL ([19]), the whole calibration process can be
automated. The target moves from about 1.5 meters from the camera to about 3.5 meters,
and the blurred edges are fed to the least-squares fitting described in the previous
section. The resulting $\sigma$ values are, in turn, fitted against the model expressed in Eq. 29.

Experiments have shown results very consistent with the model, as illustrated in
Figure 13. The target is moved from far to near; at the furthest distance the rail
motor position is zero. As the target moves through the whole range of the rail, the
blurring circle first becomes smaller and smaller and then, after a point, becomes larger
and larger. It is clear that this effect is well modeled by Eq. 29.

Figure 13: Blurring Model

3.7.5  $\sigma^2$-Map and Shape Recovery

The first step toward a dense depth map is to compute $\sigma^2 = \sigma_1^2 - \sigma_2^2$ for every pixel,
using the maximal resemblance estimation; without loss of generality we assume $\sigma_1 > \sigma_2$.
In Figure 14, we bent a sheet of A4 paper in different directions by about
1.0 inch and took images. The target is about 100 inches away from the camera. The
focal length is 130 mm; the f-number is f/4.7 for (a) and (c), and f/8.1 for (b) and (d).

Then we recover the $\sigma^2$-map for those two objects. The rectangle in Figure 14 is the
area used for the $\sigma^2$-map. The $\sigma_w$ for the Gabor transform is 5.0 pixel widths. Figure 15 shows the
$\sigma^2$-map recovery based on the images in Figure 14. The holes within the $\sigma^2$-maps are
those patches without enough texture.

Compared with the $\sigma^2$-map recovery without iterative maximal resemblance
estimation, shown in Figure 16, we can see that the results without iteration are much
noisier.

With the $\sigma^2$-map recovered and the coefficients in Eq. 29 calibrated with respect to the two
camera configurations, depth map recovery is straightforward: Brent's
method [11] is used to numerically solve the nonlinear equation. Figure 17 shows the depth
map of the convex object in Figure 14 (c) and (d), with respect to the depth reference
plane, which is behind the object.
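
The per-pixel depth solve can be sketched as below. For brevity this sketch substitutes plain bisection for Brent's method; both are bracketing root finders, and Brent's method would converge faster. The two coefficient sets, the depth bracket, and the function names are hypothetical, and the measurement is synthesized from an assumed true depth rather than taken from a real $\sigma^2$-map.

```python
def sigma_squared(depth, k):
    """Square of the blurring model for one camera configuration; k = (k1, k2, k3, k4)."""
    k1, k2, k3, k4 = k
    return (k1 + k2 / (depth + k3)) ** 2 + k4 ** 2

def recover_depth(sigma2_diff, config_1, config_2, lo, hi, tol=1e-10):
    """Find D in [lo, hi] with sigma_1(D)^2 - sigma_2(D)^2 = sigma2_diff by bisection."""
    def residual(depth):
        return (sigma_squared(depth, config_1)
                - sigma_squared(depth, config_2)
                - sigma2_diff)

    f_lo, f_hi = residual(lo), residual(hi)
    if f_lo * f_hi > 0:
        raise ValueError("no sign change in the bracket; the depth is not bracketed")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = residual(mid)
        if f_lo * f_mid <= 0:
            hi, f_hi = mid, f_mid
        else:
            lo, f_lo = mid, f_mid
    return 0.5 * (lo + hi)

# Hypothetical calibrated coefficients for the two camera configurations.
CONFIG_1 = (-3.0, 5.0, 0.5, 0.8)
CONFIG_2 = (-1.0, 2.0, 0.5, 0.6)

# Synthesize one sigma^2-map value from an assumed true depth, then invert it.
TRUE_DEPTH = 0.3
measured = sigma_squared(TRUE_DEPTH, CONFIG_1) - sigma_squared(TRUE_DEPTH, CONFIG_2)
recovered = recover_depth(measured, CONFIG_1, CONFIG_2, lo=0.0, hi=2.0)
print(recovered)
```

The bracket must contain exactly one sign change of the residual; since the difference of the two squared blurs is not monotonic in depth in general, the working range has to be chosen with that in mind.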


Figure 14: Pictures of Different Objects. (a) Concave Object Image No. 1; (b) Concave Object Image No. 2; (c) Convex Object Image No. 1; (d) Convex Object Image No. 2

Figure 15: $\sigma^2$-Map Recovery. (a) Concave Object; (b) Convex Object

Figure 16: $\sigma^2$-Map Recovery Without Maximal Resemblance Estimation

Figure 17: Shape Recovery for the Convex Object


4  Summary

In summary, we have described two sources of depth information, depth from focusing
and depth from defocusing, separately. In depth from focusing, we pursued high
accuracy from both the software and hardware directions, and experiments proved
that a great improvement was obtained. In depth from defocusing, we re-examined
the whole underlying theory, from signal processing to camera calibration, and
established a new computational model, which has been successfully demonstrated on
both synthesized and real images.

The significance of this work is two-fold. First, there are few previous reports
on shape from defocusing or focusing, and the main reason for the
inefficacy of these methods is their low precision. As demonstrated in
this paper, the improved precision of focus ranging and defocus ranging can
lead to efficient shape recovery methods. Second, it has been shown that the iterative
method proposed in this paper is capable of preserving depth locality, which is
also essential for obtaining a dense depth map.